Introduction

This article describes creating an OCCDS ADaM. Examples are currently presented and tested in the context of ADAE. However, the examples could be applied to other OCCDS ADaMs such as ADCM, ADMH, ADDV, etc.

Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.

Programming Workflow

Read in Data

To start, all data frames needed for the creation of ADAE should be read into the environment. This will be a company-specific process. Some of the data frames needed may be AE, ADSL, SUPPAE, and SUPPDM.

For example purposes, the CDISC Pilot SDTM and ADaM datasets, which are included in {admiral}, are used.

library(admiral)
library(dplyr)
library(admiral.test)
library(lubridate)

data("ae")
data("suppae")
data("adsl")

The SUPPAE domain can be joined to the AE domain using the function derive_vars_suppqual().

This function transposes the supplemental SDTM domain (e.g. SUPPAE) and joins the transposed data to the parent domain (e.g. AE) by STUDYID and USUBJID, using the variable named in IDVAR, matched against IDVARVAL, as an additional join key.

Example call:

ae <- derive_vars_suppqual(ae, suppae)
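
The transpose-and-join step described above can be illustrated with a minimal base R sketch. This is not the implementation of derive_vars_suppqual(); the toy data, the helper name join_suppqual_sketch(), and the assumption that every supplemental record uses the same IDVAR are illustrative only.

```r
# Toy parent and supplemental domains (hypothetical values).
ae_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("01", "01", "02"),
  AESEQ   = c(1, 2, 1),
  AETERM  = c("HEADACHE", "NAUSEA", "RASH"),
  stringsAsFactors = FALSE
)
suppae_toy <- data.frame(
  STUDYID  = "S1",
  USUBJID  = c("01", "02"),
  IDVAR    = "AESEQ",
  IDVARVAL = c("2", "1"),
  QNAM     = "AETRTEM",
  QVAL     = c("Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assumes a single IDVAR across all supplemental records.
join_suppqual_sketch <- function(parent, supp) {
  idvar <- unique(supp$IDVAR)
  stopifnot(length(idvar) == 1)

  # Transpose: one column per QNAM, holding the QVAL values.
  wide <- reshape(
    supp[, c("STUDYID", "USUBJID", "IDVARVAL", "QNAM", "QVAL")],
    idvar     = c("STUDYID", "USUBJID", "IDVARVAL"),
    timevar   = "QNAM",
    direction = "wide"
  )
  names(wide) <- sub("^QVAL\\.", "", names(wide))

  # IDVARVAL is character in SDTM, so convert the parent key before matching.
  parent$IDVARVAL <- as.character(parent[[idvar]])
  out <- merge(parent, wide,
               by = c("STUDYID", "USUBJID", "IDVARVAL"), all.x = TRUE)
  out$IDVARVAL <- NULL
  out[order(out$USUBJID, out[[idvar]]), ]
}

ae_joined <- join_suppqual_sketch(ae_toy, suppae_toy)
ae_joined$AETRTEM
# NA "Y" "N"
```

Records without a matching supplemental qualifier keep NA in the new column, which is why a left join (all.x = TRUE) is used in the sketch.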

At this step, it may be useful to join ADSL to your AE domain. Only the ADSL variables used for derivations are selected here. The remaining relevant ADSL variables are added later.

adsl_vars <- vars(TRTSDT, TRTEDT, TRT01A, TRT01P, DTHDT, EOSDT)

adae <- left_join(
  ae,
  select(adsl, STUDYID, USUBJID, !!!adsl_vars),
  by = c("STUDYID", "USUBJID")
)

Derive/Impute End and Start Analysis Date/time and Relative Day

This part derives ASTDTM, ASTDT, ASTDY, AENDTM, AENDT, and AENDY. The function derive_vars_dtm() can be used to derive ASTDTM and AENDTM, where the imputation rules for ASTDTM could be company-specific. ASTDT and AENDT can be derived from ASTDTM and AENDTM, respectively, using the function derive_vars_dtm_to_dt(). derive_var_astdy() and derive_var_aendy() can be used to create ASTDY and AENDY, respectively.

adae <- adae %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    date_imputation = "first",
    time_imputation = "first",
    min_dates = vars(TRTSDT)
  ) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = vars(DTHDT, EOSDT)
  ) %>%
  derive_vars_dtm_to_dt(vars(ASTDTM, AENDTM)) %>%
  derive_var_astdy(
    reference_date = TRTSDT,
    date = ASTDT
  ) %>%
  derive_var_aendy(
    reference_date = TRTSDT,
    date = AENDT
  )

See also Date and Time Imputation.
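
The relative day logic behind ASTDY and AENDY can be sketched in base R. The sketch assumes the usual CDISC study day convention (the reference date is day 1 and there is no day 0); the helper name study_day() and the dates are hypothetical, and this is not the code used by derive_var_astdy().

```r
# Hypothetical helper: study day relative to a reference date, with no day 0.
study_day <- function(date, reference_date) {
  delta <- as.integer(date - reference_date)
  # On or after the reference date: add one so the reference date is day 1.
  # Before the reference date: keep the negative difference as is.
  ifelse(delta >= 0, delta + 1L, delta)
}

trtsdt <- as.Date("2020-01-10")
astdt  <- as.Date(c("2020-01-08", "2020-01-10", "2020-01-15", NA))

astdy <- study_day(astdt, trtsdt)
astdy
# -2  1  6 NA
```

A missing analysis date yields a missing relative day, so no separate handling of NA is needed in the sketch.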

Derive Durations

The function derive_vars_duration() can be used to create the variables ADURN and ADURU.

adae <- adae %>%
  derive_vars_duration(
    new_var = ADURN,
    new_var_unit = ADURU,
    start_date = ASTDT,
    end_date = AENDT
  )
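
As a rough illustration of what these two variables hold, the following base R sketch computes a duration in days, assuming that both the start and the end date count towards the duration (hence the added one) and that the unit is reported as "DAYS". The dates are hypothetical, and derive_vars_duration() may be configured differently.

```r
# Toy start and end dates; the last record has a missing start date.
astdt <- as.Date(c("2020-01-10", "2020-01-10", NA))
aendt <- as.Date(c("2020-01-10", "2020-01-12", "2020-01-15"))

# Duration in days, counting both the start and the end day (assumption).
adurn <- as.integer(aendt - astdt) + 1L
# The unit is only populated where a duration could be computed.
aduru <- ifelse(is.na(adurn), NA_character_, "DAYS")

adurn
# 1  3 NA
aduru
# "DAYS" "DAYS" NA
```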

Derive ATC variables

The function derive_vars_atc() can be used to derive ATC class variables.

It adds the Anatomical Therapeutic Chemical (ATC) class variables from FACM to ADCM.

The expected result is the input dataset with the ATC variables added.

cm <- tibble::tribble(
         ~USUBJID, ~CMGRPID,  ~CMREFID,            ~CMDECOD,
   "BP40257-1001",     "14", "1192056",       "PARACETAMOL",
   "BP40257-1001",     "18", "2007001",        "SOLUMEDROL",
   "BP40257-1002",     "19", "2791596",    "SPIRONOLACTONE"
 )
 facm <- tibble::tribble(
         ~USUBJID, ~FAGRPID,  ~FAREFID,   ~FATESTCD, ~FASTRESC,
   "BP40257-1001",      "1", "1192056",  "CMATC1CD",       "N",
   "BP40257-1001",      "1", "1192056",  "CMATC2CD",     "N02",
   "BP40257-1001",      "1", "1192056",  "CMATC3CD",    "N02B",
   "BP40257-1001",      "1", "1192056",  "CMATC4CD",   "N02BE",

   "BP40257-1001",      "1", "2007001",  "CMATC1CD",       "D",
   "BP40257-1001",      "1", "2007001",  "CMATC2CD",     "D10",
   "BP40257-1001",      "1", "2007001",  "CMATC3CD",    "D10A",
   "BP40257-1001",      "1", "2007001",  "CMATC4CD",   "D10AA",
   "BP40257-1001",      "2", "2007001",  "CMATC1CD",       "D",
   "BP40257-1001",      "2", "2007001",  "CMATC2CD",     "D07",
   "BP40257-1001",      "2", "2007001",  "CMATC3CD",    "D07A",
   "BP40257-1001",      "2", "2007001",  "CMATC4CD",   "D07AA",
   "BP40257-1001",      "3", "2007001",  "CMATC1CD",       "H",
   "BP40257-1001",      "3", "2007001",  "CMATC2CD",     "H02",
   "BP40257-1001",      "3", "2007001",  "CMATC3CD",    "H02A",
   "BP40257-1001",      "3", "2007001",  "CMATC4CD",   "H02AB",

   "BP40257-1002",      "1", "2791596",  "CMATC1CD",       "C",
   "BP40257-1002",      "1", "2791596",  "CMATC2CD",     "C03",
   "BP40257-1002",      "1", "2791596",  "CMATC3CD",    "C03D",
   "BP40257-1002",      "1", "2791596",  "CMATC4CD",   "C03DA"
 )

derive_vars_atc(cm, facm)
#> # A tibble: 5 x 8
#>   USUBJID      CMGRPID CMREFID CMDECOD        ATC1CD ATC2CD ATC3CD ATC4CD
#>   <chr>        <chr>   <chr>   <chr>          <chr>  <chr>  <chr>  <chr> 
#> 1 BP40257-1001 14      1192056 PARACETAMOL    N      N02    N02B   N02BE 
#> 2 BP40257-1001 18      2007001 SOLUMEDROL     D      D10    D10A   D10AA 
#> 3 BP40257-1001 18      2007001 SOLUMEDROL     D      D07    D07A   D07AA 
#> 4 BP40257-1001 18      2007001 SOLUMEDROL     H      H02    H02A   H02AB 
#> 5 BP40257-1002 19      2791596 SPIRONOLACTONE C      C03    C03D   C03DA
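
The idea behind this derivation, turning the FACM records into one column per ATC level and joining them to the medication records, can be sketched in base R. The sketch below is not the implementation of derive_vars_atc(); the toy data are hypothetical, and it assumes that CMREFID in CM corresponds to FAREFID in FACM and that the new variable names are the FATESTCD values without the leading "CM".

```r
cm_toy <- data.frame(
  USUBJID = c("01", "02"),
  CMREFID = c("100", "200"),
  CMDECOD = c("PARACETAMOL", "SPIRONOLACTONE"),
  stringsAsFactors = FALSE
)
facm_toy <- data.frame(
  USUBJID  = c("01", "01", "02", "02"),
  FAGRPID  = "1",
  FAREFID  = c("100", "100", "200", "200"),
  FATESTCD = c("CMATC1CD", "CMATC2CD", "CMATC1CD", "CMATC2CD"),
  FASTRESC = c("N", "N02", "C", "C03"),
  stringsAsFactors = FALSE
)

# Transpose FACM: one row per subject/group/reference, one column per test code.
atc_wide <- reshape(
  facm_toy,
  idvar     = c("USUBJID", "FAGRPID", "FAREFID"),
  timevar   = "FATESTCD",
  v.names   = "FASTRESC",
  direction = "wide"
)
# "FASTRESC.CMATC1CD" becomes "ATC1CD", and so on.
names(atc_wide) <- sub("^FASTRESC\\.CM", "", names(atc_wide))
atc_wide$FAGRPID <- NULL

# Join the ATC columns to the medication records.
cm_atc <- merge(
  cm_toy, atc_wide,
  by.x = c("USUBJID", "CMREFID"),
  by.y = c("USUBJID", "FAREFID"),
  all.x = TRUE
)
names(cm_atc)
# "USUBJID" "CMREFID" "CMDECOD" "ATC1CD" "ATC2CD"
```

A medication with several ATC groups (several FAGRPID values) would produce several output rows in this sketch, which matches the repeated SOLUMEDROL rows in the output above.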

Derive Planned and Actual Treatment

TRTA and TRTP must correspond to the treatment variables TRTxxP and/or TRTxxA in ADSL. The derivation of TRTA and TRTP for a record is protocol- and analysis-specific. admiral does not currently have functionality to assist with TRTA and TRTP assignment.

However, an example of a simple implementation could be:

adae <- mutate(adae, TRTP = TRT01P, TRTA = TRT01A)

count(adae, TRTP, TRTA, TRT01P, TRT01A)
#> # A tibble: 3 x 5
#>   TRTP   TRTA   TRT01P TRT01A     n
#>   <chr>  <chr>  <chr>  <chr>  <int>
#> 1 Pbo    Pbo    Pbo    Pbo      301
#> 2 Xan_Hi Xan_Hi Xan_Hi Xan_Hi   455
#> 3 Xan_Lo Xan_Lo Xan_Lo Xan_Lo   435

Derive Date/Date-time of Last Dose

The function derive_last_dose() can be used to derive the last dose date before the start of the event. This function can also provide traceability variables (e.g. LDOSEDOM, LDOSESEQ) using the traceability_vars argument.

data(ex_single)
adae <- adae %>%
  derive_last_dose(
    ex_single,
    filter_ex = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) &
      nchar(EXENDTC) >= 10,
    dose_start = EXSTDTC,
    dose_end = EXENDTC,
    analysis_date = ASTDT,
    dataset_seq_var = AESEQ,
    new_var = LDOSEDTM,
    output_datetime = TRUE,
    check_dates_only = FALSE
  )
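
The underlying lookup can be sketched in base R: for each event, take the latest dose end date of the same subject that is not after the event start. The toy data and variable names below are hypothetical, the sketch assumes that a dose on the same day as the event counts, and it ignores the filtering and date-time handling that derive_last_dose() offers.

```r
ex_toy <- data.frame(
  USUBJID = c("01", "01", "01", "02"),
  EXENDT  = as.Date(c("2020-01-01", "2020-01-15", "2020-02-01", "2020-01-10")),
  stringsAsFactors = FALSE
)
adae_toy <- data.frame(
  USUBJID = c("01", "01", "02"),
  AESEQ   = c(1, 2, 1),
  ASTDT   = as.Date(c("2020-01-20", "2019-12-25", "2020-01-10")),
  stringsAsFactors = FALSE
)

# Start with a missing last dose date for every event.
adae_toy$LDOSEDT <- as.Date(NA)
for (i in seq_len(nrow(adae_toy))) {
  # Doses of the same subject ending on or before the event start.
  doses <- ex_toy$EXENDT[
    ex_toy$USUBJID == adae_toy$USUBJID[i] & ex_toy$EXENDT <= adae_toy$ASTDT[i]
  ]
  # Events starting before the first dose keep a missing last dose date.
  if (length(doses) > 0) {
    adae_toy$LDOSEDT[i] <- max(doses)
  }
}

as.character(adae_toy$LDOSEDT)
# "2020-01-15" NA "2020-01-10"
```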

Derive Severity, Causality, and Toxicity Grade

The variables ASEV, AREL, and ATOXGR can be added with a simple mutate() call if no imputation is required.

adae <- adae %>%
  mutate(
    ASEV = AESEV,
    AREL = AEREL
  )

Derive Treatment Emergent Flag

To derive the treatment emergent flag TRTEMFL, a simple dplyr::mutate() call can be used. In the example below, an event is flagged if it starts on or after the treatment start date and no later than 30 days after the treatment end date.

adae <- adae %>%
  mutate(
    TRTEMFL = ifelse(ASTDT >= TRTSDT & ASTDT <= TRTEDT + days(30), "Y", NA_character_)
  )
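
To make the boundaries of the 30-day window concrete, here is a small base R sketch on hypothetical dates. It mirrors the condition used above but adds the window as a plain number of days instead of lubridate's days(); the variable window_days is introduced only for illustration.

```r
adae_toy <- data.frame(
  ASTDT  = as.Date(c("2019-12-31", "2020-01-01", "2020-03-31", "2020-04-01", NA)),
  TRTSDT = as.Date("2020-01-01"),
  TRTEDT = as.Date("2020-03-01")
)

window_days <- 30  # hypothetical name for the follow-up window

# Flag events starting between TRTSDT and TRTEDT + 30 days (both inclusive).
adae_toy$TRTEMFL <- ifelse(
  adae_toy$ASTDT >= adae_toy$TRTSDT &
    adae_toy$ASTDT <= adae_toy$TRTEDT + window_days,
  "Y",
  NA_character_
)

adae_toy$TRTEMFL
# NA "Y" "Y" NA NA
```

The event on 2020-03-31 is exactly 30 days after TRTEDT and is still flagged, while the event with a missing start date is not flagged because the comparison evaluates to NA.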

To derive the on-treatment flag (ONTRTFL) in an ADaM dataset with a single assessment date, derive_var_ontrtfl() can be used.

The expected result is the input dataset with an additional column named ONTRTFL with a value of "Y" or NA.

bds1 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-02-24"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-01-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2019-12-31"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds1,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT
)
#> # A tibble: 3 x 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-02-24 2020-01-01 2020-03-01 Y      
#> 2 P02     2020-01-01 2020-01-01 2020-03-01 Y      
#> 3 P03     2019-12-31 2020-01-01 2020-03-01 <NA>

The ref_end_window argument extends the on-treatment period by the given number of days after ref_end_date. In the next example, assessments up to 60 days after TRTEDT are flagged.

bds2 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-07-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-04-30"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2020-03-15"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds2,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  ref_end_window = 60
)
#> # A tibble: 3 x 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-07-01 2020-01-01 2020-03-01 <NA>   
#> 2 P02     2020-04-30 2020-01-01 2020-03-01 Y      
#> 3 P03     2020-03-15 2020-01-01 2020-03-01 Y

The filter_pre_timepoint argument identifies assessments that are considered pre-treatment even though they fall on the reference start date. Such records are not flagged.

bds3 <- tibble::tribble(
  ~ADTM,              ~TRTSDTM,           ~TRTEDTM,           ~TPT,
  "2020-01-02T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA,
  "2020-01-01T12:00", "2020-01-01T12:00", "2020-03-01T12:00", "PRE",
  "2019-12-31T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA
) %>%
 mutate(
  ADTM = ymd_hm(ADTM),
  TRTSDTM = ymd_hm(TRTSDTM),
  TRTEDTM = ymd_hm(TRTEDTM)
 )
derive_var_ontrtfl(
  bds3,
  start_date = ADTM,
  ref_start_date = TRTSDTM,
  ref_end_date = TRTEDTM,
  filter_pre_timepoint = TPT == "PRE"
)
#> # A tibble: 3 x 5
#>   ADTM                TRTSDTM             TRTEDTM             TPT   ONTRTFL
#>   <dttm>              <dttm>              <dttm>              <chr> <chr>  
#> 1 2020-01-02 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  Y      
#> 2 2020-01-01 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 PRE   <NA>   
#> 3 2019-12-31 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  <NA>

Derive Occurrence Flags

The function derive_extreme_flag() can help derive variables such as AOCCIFL, AOCCPIFL, AOCCSIFL, AOCXIFL, AOCXPIFL, and AOCXSIFL.

If grades were collected, the following can be used to flag first occurrence of maximum toxicity grade.

adae <- adae %>%
  derive_extreme_flag(
    by_vars = vars(USUBJID),
    order = vars(desc(ATOXGR), ASTDTM, AESEQ),
    new_var = AOCCIFL,
    filter = TRTEMFL == "Y",
    mode = "first"
  )

Similarly, ASEV can be used to derive the occurrence flags if severity is collected. In this case, the variable may first need to be recoded into a numeric one. The following flags the first occurrence of the most severe adverse event:

adae <- adae %>%
  mutate(
    ASEVN = as.integer(factor(ASEV, levels = c("MILD", "MODERATE", "SEVERE", "DEATH THREATENING")))
  ) %>%
  derive_extreme_flag(
    by_vars = vars(USUBJID),
    order = vars(desc(ASEVN), ASTDTM, AESEQ),
    new_var = AOCCIFL,
    filter = TRTEMFL == "Y",
    mode = "first"
  )
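
The filter, sort, and take-first logic behind such an occurrence flag can be sketched in base R. This is a simplified illustration on hypothetical data, not the implementation of derive_extreme_flag(); the severity levels are restricted to three values for brevity.

```r
adae_toy <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  AESEQ   = c(1, 2, 3, 1, 2),
  ASEV    = c("MILD", "SEVERE", "SEVERE", "MODERATE", "MILD"),
  ASTDT   = as.Date(c("2020-01-05", "2020-01-09", "2020-01-07",
                      "2020-02-01", "2020-02-03")),
  TRTEMFL = c("Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Recode severity into a numeric variable so that it can be sorted.
adae_toy$ASEVN <- as.integer(
  factor(adae_toy$ASEV, levels = c("MILD", "MODERATE", "SEVERE"))
)

# Only treatment emergent records are candidates; which() drops the NAs.
eligible <- which(adae_toy$TRTEMFL == "Y")

# Sort candidates by subject, descending severity, start date, and sequence.
sorted <- eligible[order(
  adae_toy$USUBJID[eligible],
  -adae_toy$ASEVN[eligible],
  adae_toy$ASTDT[eligible],
  adae_toy$AESEQ[eligible]
)]

# The first sorted record per subject gets the flag.
first_rows <- sorted[!duplicated(adae_toy$USUBJID[sorted])]
adae_toy$AOCCIFL <- NA_character_
adae_toy$AOCCIFL[first_rows] <- "Y"

adae_toy$AOCCIFL
# NA NA "Y" NA "Y"
```

For subject 01 both severe events are candidates, and the earlier one (AESEQ 3) is flagged. For subject 02 the moderate event is not treatment emergent, so the mild event is flagged instead.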

Derive Query Variables

Query variables can be added to an ADaM with the function derive_vars_query(). For example, in ADAE, MedDRA SMQs and/or Customized Query variables may be needed.

The function expects the dictionary and/or lookup information to be provided as input in a standard structure, which is detailed in the documentation of derive_vars_query(). See also the Queries dataset documentation.

The expected result is the input dataset with the query variables added.

data("queries")
adae1 <- tibble::tribble(
  ~USUBJID, ~ASTDTM, ~AETERM, ~AESEQ, ~AEDECOD, ~AELLT, ~AELLTCD,
  "01", "2020-06-02 23:59:59", "ALANINE AMINOTRANSFERASE ABNORMAL",
    3, "Alanine aminotransferase abnormal", NA_character_, NA_integer_,
  "02", "2020-06-05 23:59:59", "BASEDOW'S DISEASE",
    5, "Basedow's disease", NA_character_, 1L,
  "03", "2020-06-07 23:59:59", "SOME TERM",
    2, "Some query", "Some term", NA_integer_,
  "05", "2020-06-09 23:59:59", "ALVEOLAR PROTEINOSIS",
    7, "Alveolar proteinosis", NA_character_,  NA_integer_
)

adae_query <- derive_vars_query(dataset = adae1 , dataset_queries = queries)

As with SMQs, the derive_vars_query() function can be used to derive Standardized Drug Groupings (SDG).

sdg <- tibble::tribble(
  ~VAR_PREFIX, ~QUERY_NAME,       ~SDG_ID, ~QUERY_SCOPE, ~QUERY_SCOPE_NUM, ~TERM_LEVEL, ~TERM_NAME,         ~TERM_ID,
  "SDG01",     "Diuretics",       11,      "BROAD",      1,                "CMDECOD",   "Diuretic 1",       NA,
  "SDG01",     "Diuretics",       11,      "BROAD",      2,                "CMDECOD",   "Diuretic 2",       NA,
  "SDG02",     "Corticosteroids", 12,      "BROAD",      1,                "CMDECOD",   "Corticosteroid 1", NA,
  "SDG02",     "Corticosteroids", 12,      "BROAD",      2,                "CMDECOD",   "Corticosteroid 2", NA,
  "SDG02",     "Corticosteroids", 12,      "BROAD",      2,                "CMDECOD",   "Corticosteroid 3", NA,
)
adcm <- tibble::tribble(
  ~USUBJID, ~ASTDTM,               ~CMDECOD,
  "01",     "2020-06-02 23:59:59", "Diuretic 1",
  "02",     "2020-06-05 23:59:59", "Diuretic 1",
  "03",     "2020-06-07 23:59:59", "Corticosteroid 2",
  "05",     "2020-06-09 23:59:59", "Diuretic 2"
)
adcm_query <- derive_vars_query(adcm, sdg)
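
Conceptually, each query prefix yields a name variable that is populated on records whose term belongs to the query. The base R sketch below illustrates only that matching step, under simplifying assumptions: terms are matched by name only (TERM_NAME), scope and ID variables are ignored, and the name variable is assumed to be the prefix followed by "NAM". The toy data are hypothetical, and this is not the implementation of derive_vars_query().

```r
queries_toy <- data.frame(
  VAR_PREFIX = c("SDG01", "SDG01", "SDG02"),
  QUERY_NAME = c("Diuretics", "Diuretics", "Corticosteroids"),
  TERM_LEVEL = "CMDECOD",
  TERM_NAME  = c("Diuretic 1", "Diuretic 2", "Corticosteroid 1"),
  stringsAsFactors = FALSE
)
adcm_toy <- data.frame(
  USUBJID = c("01", "02", "03"),
  CMDECOD = c("Diuretic 1", "Corticosteroid 1", "Other drug"),
  stringsAsFactors = FALSE
)

for (prefix in unique(queries_toy$VAR_PREFIX)) {
  query <- queries_toy[queries_toy$VAR_PREFIX == prefix, ]
  # TERM_LEVEL names the dataset variable that holds the terms to match.
  hit <- adcm_toy[[query$TERM_LEVEL[1]]] %in% query$TERM_NAME
  # Populate the query name on matching records only.
  adcm_toy[[paste0(prefix, "NAM")]] <- ifelse(
    hit, query$QUERY_NAME[1], NA_character_
  )
}

adcm_toy$SDG01NAM
# "Diuretics" NA NA
adcm_toy$SDG02NAM
# NA "Corticosteroids" NA
```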

Add the ADSL variables

If needed, the other ADSL variables can now be added:

adae <- adae %>%
  left_join(select(adsl, !!!admiral:::negate_vars(adsl_vars)),
            by = c("STUDYID", "USUBJID")
  )
#> Warning: Column `STUDYID` has different attributes on LHS and RHS of join
#> Warning: Column `USUBJID` has different attributes on LHS and RHS of join

Example Scripts

ADaM    Sample Code
ADAE    ad_adae.R
ADCM    ad_adcm.R